Cross-lingual alignments of ELMo contextual embeddings
Authors
Matej Ulčar, Marko Robnik-Šikonja
Abstract
Building machine learning prediction models for a specific NLP task requires sufficient training data, which can be difficult to obtain for less-resourced languages. Cross-lingual embeddings map word embeddings from a less-resourced language to a resource-rich language, so that a model trained on data from the resource-rich language can also be used in the less-resourced language. To produce cross-lingual mappings of recent contextual embeddings, the anchor points between the embedding spaces have to be words in the same context. We address this issue with a novel method for creating cross-lingual alignment datasets. Based on that, we propose several mapping methods for ELMo embeddings. The proposed linear mappings use existing Vecmap and MUSE alignments. The novel nonlinear ELMoGAN mappings are based on GANs and do not assume isomorphic embedding spaces. We evaluate the proposed methods on nine languages, using four downstream tasks: named entity recognition (NER), dependency parsing (DP), terminology alignment, and sentiment analysis. The ELMoGAN methods perform very well on the NER tasks, with lower loss compared to direct training for some languages. In DP and sentiment analysis, the linear mapping variants are more successful.
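The linear mappings mentioned above rely on Vecmap and MUSE, whose supervised variants reduce to an orthogonal Procrustes problem over pairs of anchor embeddings. The sketch below illustrates only that generic step, not the paper's implementation: the function name, array shapes, and synthetic anchor data are assumptions made for the example.

```python
# Minimal sketch of a Vecmap/MUSE-style supervised linear alignment:
# solve the orthogonal Procrustes problem over anchor pairs of contextual
# embeddings (the same word occurring in the same context in both
# languages). Illustrative only; not the paper's code.
import numpy as np

def fit_linear_alignment(X_src, X_tgt):
    """Return an orthogonal W minimizing ||X_src @ W - X_tgt||_F.

    X_src, X_tgt: (n_anchors, dim) arrays of anchor embeddings.
    """
    # Closed-form Procrustes solution via SVD of the cross-covariance.
    U, _, Vt = np.linalg.svd(X_src.T @ X_tgt)
    return U @ Vt  # (dim, dim): maps source vectors into the target space

# Synthetic check: recover a known orthogonal map from anchor pairs.
rng = np.random.default_rng(0)
dim = 1024                              # ELMo layer dimensionality
X_src = rng.normal(size=(2000, dim))
W_true, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
X_tgt = X_src @ W_true                  # idealized "parallel" anchors
W = fit_linear_alignment(X_src, X_tgt)
print(np.allclose(X_src @ W, X_tgt, atol=1e-5))  # True
```

A mapped source-language vector `X_src[i] @ W` can then be fed to a task model trained only on target-language data. The nonlinear ELMoGAN variants instead learn the mapping adversarially, dropping the orthogonality constraint and hence the assumption that the two embedding spaces are isomorphic.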
Similar resources
A Strong Baseline for Learning Cross-Lingual Word Embeddings from Sentence Alignments
While cross-lingual word embeddings have been studied extensively in recent years, the qualitative differences between the different algorithms remain vague. We observe that whether or not an algorithm uses a particular feature set (sentence IDs) accounts for a significant performance gap among these algorithms. This feature set is also used by traditional alignment algorithms, such as IBM Mode...
Cross-lingual Wikification Using Multilingual Embeddings
Cross-lingual Wikification is the task of grounding mentions written in non-English documents to entries in the English Wikipedia. This task involves the problem of comparing textual clues across languages, which requires developing a notion of similarity between text snippets across languages. In this paper, we address this problem by jointly training multilingual embeddings for words and Wiki...
Trans-gram, Fast Cross-lingual Word-embeddings
We introduce Trans-gram, a simple and computationally efficient method to simultaneously learn and align word embeddings for a variety of languages, using only monolingual data and a smaller set of sentence-aligned data. We use our new method to compute aligned word embeddings for twenty-one languages using English as a pivot language. We show that some linguistic features are aligned across lang...
Cross-lingual Models of Word Embeddings: An Empirical Comparison
Despite interest in using cross-lingual knowledge to learn word embeddings for various tasks, a systematic comparison of the possible approaches is lacking in the literature. We perform an extensive evaluation of four popular approaches of inducing cross-lingual embeddings, each requiring a different form of supervision, on four typologically different language pairs. Our evaluation setup spans...
A Variational Autoencoding Approach for Inducing Cross-lingual Word Embeddings
Cross-language learning allows one to use training data from one language to build models for another language. Many traditional approaches require word-level alignments of sentences from parallel corpora; in this paper, we define a general bilingual training objective function requiring only a sentence-level parallel corpus. We propose a variational autoencoding approach for training bilingual word e...
Journal
Journal title: Neural Computing and Applications
Year: 2022
ISSN: 0941-0643, 1433-3058
DOI: https://doi.org/10.1007/s00521-022-07164-x